1.  INTRODUCTION

Some of the most important and useful tools of martingale theory
are the inequalities bounding tail-probabilities for the supremum of a
(sub-)martingale $X(\cdot)$ over a time-interval $[0,T]$ in terms of
expectations involving $X(\cdot)$ and related stochastic processes evaluated
at the single time-endpoint $T$. The most famous inequality of this type
is Doob's (1953, Theorem 3.4); another, less well-known but more
general, is due to Lenglart (1977); see Burkholder (1973) for other
"distribution function inequalities" of this type. The submartingale
maximal inequality of Birnbaum and Marshall (1961, Theorem 5.1) is
a closely related result, which Slud (1986) has recently generalized
and shown to follow from Lenglart's result. The present paper develops
an exponential-bound result for continuous-time martingales, which in
many examples is much more informative than the previously mentioned
inequalities but has the disadvantage that it applies only to martingales,
not submartingales.


2.  EXPONENTIAL  BOUNDS  FOR  MARTINGALES 


The present Section develops an exponential inequality
generalizing to continuous-time martingales Kolmogorov's famous upper
exponential bound [Loève, 1955, pp. 254-5] for sums of uniformly
bounded independent summands. The inequality is due [in the
discrete-time martingale setting] to Steiger (1969) and was re-proved and
exploited by Freedman (1975, pp. 102-4). The following restatement of
Freedman's version is given here without proof.

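Proposition. Let $M(\cdot)$ be a $\{F_t\}$-adapted martingale on the
probability space $(\Omega,F,P)$ with parameter-set $[0,\infty)$, and assume
that $M(\cdot)$ is a.s. in $D[0,t]$ as a random function for each $t < \infty$.
Also let $\{t_i : 0 \le i \le L\}$ be a nondecreasing sequence of stopping times
with $t_0 = 0$, such that for each $i = 1,2,\dots,L$ and a finite constant $K$,
$|M(t_i) - M(t_{i-1})| \le K$ a.s. Then for all $\alpha$ and $\beta > 0$,

\[ P\Big\{ M(t_i) \ge \alpha \ \text{and} \ \sum_{j=1}^{i} E\{ M^2(t_j) - M^2(t_{j-1}) \mid F_{t_{j-1}} \} \le \beta \ \text{for some } i \Big\} \le \exp\big\{ -\tfrac12\, \alpha^2/(K\alpha+\beta) \big\} \tag{2.1} \]
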
In stating the foregoing Proposition, we used the idea of
conditioning on past history $F_\tau$ up to a stopping time $\tau$. The
definition is

\[ F_\tau = \sigma\big( A : \ \text{for } t \in [0,\infty),\ A \cap [\tau \le t] \in F_t \big). \]

See Liptser and Shiryaev (1977, vol. 1, pp. 25-29) for further
background concerning $\sigma$-fields $F_\tau$. The importance for us of
partitioning the interval $[0,T)$ by means of increasing stopping-time
sequences $\{t_i\}$ is that the uniform bounds on $|M(t_i) - M(t_{i-1})|$ are not
very restrictive when the times $t_i$ are allowed to be random.


We next restrict the continuous-time martingales $M(\cdot)$ under
consideration to have calculable variance-processes (cf. Brown 1978;
Helland 1982) in the following strong sense:

we assume for any nested increasing sequence of partitions
$Q(k) = \{t_{jk}\}_j$ of $[0,\infty)$ consisting of a.s. nondecreasing sequences of
$\{F_t\}$ stopping times $t_{jk}$ satisfying ($t_{0k} \equiv 0$ and)

(i) $E\,M^2(t_{jk}) < \infty$ for each $j$ and $k$,

(ii) $\max\{ j : t_{jk} \le t \} < \infty$ a.s. for each $k$ and each $t < \infty$,

(iii) $\mathrm{mesh}(Q(k)) \equiv \max_j\,(t_{j+1,k} - t_{jk}) \to 0$ as $k \to \infty$,

that for each $t$ the $L^1$-limit

\[ V(t) \equiv \lim_{k\to\infty} V_k(t) \equiv \lim_{k\to\infty} \sum_{j:\, t_{jk} < t} E\{ (M(t \wedge t_{j+1,k}) - M(t_{jk}))^2 \mid F_{t_{jk}} \} \]

exists. When we discuss local martingales $M(\cdot)$, we implicitly
restrict attention to $D[0,\infty)$ processes for which some increasing
sequences $\{\tau_n\}$ of stopping-times yield martingales $M(\cdot \wedge \tau_n)$ with
calculable variance-processes.

The special class of martingales which have calculable variance
processes according to the foregoing definition is well known (Brown
1978; Jacod 1979; Slud 198/) to include all continuous-path
martingales and martingales whose squares are "quasi-left-continuous"
(i.e., have a.s. continuous Doob-Meyer compensators); all martingales
adapted to a $\sigma$-field family $F_t = F_0 \vee \sigma\{ N(s) : 0 < s \le t \}$ where $N(\cdot)$
is a simple multivariate counting-process; and all (finite sums of)
stochastic integrals of predictable processes of the preceding types.
Therefore, although not all square-integrable martingales have
calculable variance-processes (see the discussion of Helland 1982), the
class of processes which do seems to be quite large enough for most
applications.

An important feature of the inequality (2.1) is that the upper
bound does not depend on $L$. Therefore a limit can be taken over a
sequence of partitions $Q(k)$ satisfying (i)-(iii) as above. The result
obtained in this way seems to be one of the first in which the concept of
calculable variance plays a crucial role.


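Theorem 2.1. Let $M(\cdot)$ be a $\{F_t\}$-adapted locally square-integrable
martingale in $D[0,T)$ with calculable variance-process, and let $\tau$ be a
stopping-time $< T$ a.s. Then for the calculable variance-process $V(\cdot)$
of $M(\cdot)$ and any $\alpha$, $\beta > 0$,

\[ P\{ M(t) \ge \alpha \ \text{and} \ V(t) \le \beta \ \text{for some } t \in [0,\tau] \} \le \exp\big\{ -\tfrac12\, \alpha^2/(K\alpha+\beta) \big\} \tag{2.2} \]

where $K \equiv \mathrm{ess.\,sup.}\ \sup\{ |\Delta M(t)| : 0 \le t \le \tau \}$.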
Proof. For arbitrarily small $\delta > 0$, define the increasing
sequence $\{\sigma_n^\delta\}$ of $\{F_t\}$ stopping-times by $\sigma_0^\delta = 0$ and

\[ \sigma_n \equiv \sigma_n^\delta \equiv \tau \wedge \min\{ t > \sigma_{n-1}^\delta : |M(t) - M(\sigma_{n-1}^\delta)| \ge \delta \}. \]

Right-continuity of $M(\cdot)$ implies that such a sequence exists and
increases a.s. to $\tau$, and that

\[ \sup\{ |M(t) - M(\sigma_n^\delta)| : \sigma_n^\delta \le t < \sigma_{n+1}^\delta \} \le \delta \quad \text{a.s.} \tag{2.3} \]

Now let $Q(k) = \{t_{jk}\}_j$ be nested increasing random partitions of
$[0,T)$ by stopping times, such that

\[ \text{for each } k \ge 1, \quad \{\sigma_n^{1/k}\}_n \subset Q(k) \ \text{ and (i)-(iii) hold.} \tag{2.4} \]

Then by construction, for every $k \ge 1$,

\[ \max_j |M(t_{j+1,k}) - M(t_{jk})| \le K + k^{-1} \quad \text{a.s.} \]

Impose for this paragraph the auxiliary assumption that $M(\cdot)$
itself is a square-integrable martingale. It is not hard to show from the
calculable-variance property of $M(\cdot)$ that

\[ \sup_{0 \le s \le T} \Big| \sum_{j:\, t_{jk} < s} E\{ M^2(t_{j+1,k}) - M^2(t_{jk}) \mid F_{t_{jk}} \} - V(s) \Big| \xrightarrow{P} 0 \tag{2.5} \]

[To see this, observe first that the processes $V_m(s \wedge t) - V_k(s \wedge t)$ for $m \ge k$
are each $\{F_s\}$ martingales for the time-index $s$ in $\{t_{ik}\}_i \cup \{t\}$,
so that by Doob's inequality, for each $t$ and each $c > 0$,

\[ P\{ \max_i |V_m(t \wedge t_{ik}) - V_k(t \wedge t_{ik})| \ge c \} \le c^{-1}\, E|V_m(t) - V_k(t)| \]

which converges to 0 as $k$, $m$ go to $\infty$, by the calculable-variance
assumption. Since it is easy to check from (2.3) and (2.4) that

-2 

SUP\  '  Vrn^ik^:  hk^^i+l  k  *  1  ^  k  ,  which  converges  in 

probability to 0 as $k \to \infty$, the assertion (2.5) follows from the a.s.
monotonicity of $V(\cdot)$, the property (iii), and the fact that a.s. $V_k(s) \ge
V_k(t_{jk})$ whenever $s \ge t_{jk}$. ]

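The Proposition of this Section applied to the martingale $M(\cdot \wedge \tau_n)$
with calculable variance-process yields for each fixed integer $n$ and
each constant $\gamma > 0$,

\[ P\Big\{ \text{for some } i \ge 1,\ M(t_{ik} \wedge \tau_n) \ge \alpha - \gamma \ \text{and} \ \sum_{j<i} E\{ M^2(t_{j+1,k} \wedge \tau_n) - M^2(t_{jk} \wedge \tau_n) \mid F_{t_{jk}} \} \le \beta + \gamma \Big\} \]
\[ \qquad \le \exp\big\{ -\tfrac12\, (\alpha-\gamma)^2 / [(K+\gamma)(\alpha-\gamma) + \beta + \gamma] \big\} \tag{2.6} \]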


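But (2.5) together with right-continuity of $M(\cdot)$ implies for each $n$,

\[ P\Big\{ \big[ M(t \wedge \tau_n) \ge \alpha \ \text{for some } t \text{ with } V(t \wedge \tau_n) \le \beta \big] \setminus \big[ \text{for some } i,\ M(t_{ik} \wedge \tau_n) \ge \alpha - \gamma \ \text{and} \ \sum_{j<i} E\{ M^2(t_{j+1,k} \wedge \tau_n) - M^2(t_{jk} \wedge \tau_n) \mid F_{t_{jk}} \} \le \beta + \gamma \big] \Big\} \to 0 \quad \text{as } k \to \infty. \tag{2.7} \]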


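Combining (2.6) and (2.7) and letting $k \to \infty$ gives

\[ P\{ M(t \wedge \tau_n) \ge \alpha \ \text{for some } t \text{ with } V(t \wedge \tau_n) \le \beta \} \le \exp\big\{ -\tfrac12\, (\alpha-\gamma)^2 / [(K+\gamma)(\alpha-\gamma) + \beta + \gamma] \big\}. \]

Finally, let $n \to \infty$ and $\gamma \to 0$ to complete the proof of (2.2). □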

As a first illustration of Theorem 2.1, consider the case of
Wiener process $M(\cdot) \equiv W(\cdot)$ on $[0,T]$, where $T < \infty$. The variance
process $V(\cdot)$ for $W(\cdot)$, or equivalently the compensator for $W^2(\cdot)$, is
simply $V(t) = t$; and of course, continuity of $W(\cdot)$ implies that the
number $K$ in Theorem 2.1 is 0. Therefore Theorem 2.1 (with $\beta = T$) says
in this context that

\[ P\{ \sup_{0 \le t \le T} W(t) \ge \alpha \} \le e^{-\frac12 \alpha^2/T} \tag{2.8} \]

Of course, more exact information exists about the probability
distribution of $\sup_{t \in [0,T]} W(t)$ [see Feller, 1971, vol. 2, pp. 340-1, or
Karlin and Taylor, 1975, pp. 345-7, where it is shown that the left-hand
side of (2.8) is exactly equal to $2\{1 - \Phi(\alpha/T^{1/2})\}$], but the Theorem gets
the correct order of magnitude for the logarithm of the tail-probability
for large $\alpha$.
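The comparison between the exponential bound (2.8) and the exact reflection-principle value $2\{1 - \Phi(\alpha/T^{1/2})\}$ can be checked numerically. The following is a minimal sketch; the function names `exponential_bound` and `exact_tail` are illustrative choices, not notation from the paper, and the identity $2\{1-\Phi(z)\} = \mathrm{erfc}(z/\sqrt 2)$ is used to evaluate the exact tail.

```python
import math

def exponential_bound(alpha: float, T: float) -> float:
    """Right-hand side of (2.8): exp(-alpha^2 / (2 T))."""
    return math.exp(-0.5 * alpha ** 2 / T)

def exact_tail(alpha: float, T: float) -> float:
    """Exact value 2 * (1 - Phi(alpha / sqrt(T))) of P{sup_{t<=T} W(t) >= alpha}."""
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)), which stays accurate far in the tail.
    return math.erfc(alpha / math.sqrt(2.0 * T))

if __name__ == "__main__":
    T = 1.0
    for alpha in (1.0, 2.0, 4.0, 8.0):
        bound = exponential_bound(alpha, T)
        exact = exact_tail(alpha, T)
        # The ratio of logarithms approaches 1 as alpha grows.
        ratio = math.log(exact) / math.log(bound)
        print(f"alpha={alpha:4.1f}  bound={bound:.3e}  exact={exact:.3e}  log-ratio={ratio:.3f}")
```

The bound always dominates the exact tail, and the ratio of the two logarithms drifts toward 1 as $\alpha$ increases, which is the sense in which (2.8) captures the correct order of magnitude.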

Thus Theorem 2.1 can be thought of as a generalization of the
known distributional bound (2.8) for the supremum of a Wiener process,
controlling the supremum of a general local martingale in terms of the
intrinsic time-scale given by its variance-process. Let us consider as a
second application of the Theorem the case $M(\cdot) = N(\cdot) - A(\cdot)$, where
$N$ is a simple counting-process on $[0,\infty)$, and $A(\cdot)$ is its
compensator with respect to a $\sigma$-field family $F_t = F_0 \vee \sigma(N(u) : u \le t)$.
Then the variance-process $V(\cdot)$ for $M(\cdot)$ [the compensator for $M^2(\cdot)$]
is described by Liptser and Shiryaev (1977, vol. 2, Theorem 18.2)
and has the property $V(\cdot) \le A(\cdot)$ a.s., with a.s. equality in case all the
conditional distributions of times $T_{r+1} - T_r$ between successive jumps
given $F_{T_r}$ are nonatomic a.s. Now the quantity $K$ in the Theorem is 1.
Take $\alpha = c\beta$, and apply Theorem 2.1 (to $M$ and to $-M$) to conclude

\[ P\{ |N(t) - A(t)| \ge c\beta \ \text{for some } t \text{ with } V(t) \le \beta \} \le 2\, e^{-\frac12 c^2 \beta/(c+1)} \]
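As a rough sanity check of the counting-process bound, one can simulate the simplest case, a homogeneous Poisson process with rate $\lambda$ on $[0, T_0]$, for which $A(t) = V(t) = \lambda t \le \beta \equiv \lambda T_0$. The sketch below is only illustrative: the helper names, the choice of a Poisson process, and the parameter values are assumptions made here, not taken from the paper. It estimates the frequency of $\sup_t |N(t) - \lambda t| \ge c\beta$ and compares it with $2\exp\{-\frac12 c^2\beta/(c+1)\}$.

```python
import math
import random

def theorem_bound(alpha: float, beta: float, K: float) -> float:
    """Right-hand side of (2.2): exp(-alpha^2 / (2 (K alpha + beta)))."""
    return math.exp(-0.5 * alpha ** 2 / (K * alpha + beta))

def sup_abs_compensated_poisson(rate: float, horizon: float, rng: random.Random) -> float:
    """sup over [0, horizon] of |N(t) - rate * t| along one simulated Poisson path."""
    t, count, worst = 0.0, 0, 0.0
    while True:
        t_next = t + rng.expovariate(rate)
        if t_next > horizon:
            # No further jumps: N(t) - rate * t decreases linearly up to the horizon.
            return max(worst, abs(count - rate * horizon))
        # Between jumps the path is linear, so extremes occur just before or at a jump.
        worst = max(worst, abs(count - rate * t_next), abs(count + 1 - rate * t_next))
        t, count = t_next, count + 1

def exceedance_frequency(rate: float, horizon: float, c: float,
                         n_paths: int, seed: int = 0) -> float:
    """Fraction of simulated paths with sup |N(t) - rate * t| >= c * beta."""
    rng = random.Random(seed)
    beta = rate * horizon  # V(t) = rate * t <= beta on [0, horizon]
    hits = sum(sup_abs_compensated_poisson(rate, horizon, rng) >= c * beta
               for _ in range(n_paths))
    return hits / n_paths

if __name__ == "__main__":
    rate, horizon, c = 1.0, 20.0, 0.5
    beta = rate * horizon
    print("empirical frequency:", exceedance_frequency(rate, horizon, c, 20000))
    print("two-sided bound    :", 2.0 * theorem_bound(c * beta, beta, K=1.0))
```

In this configuration the simulated frequency should fall well below the bound, which is expected since the inequality is not tight for moderate deviations.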


3.  APPLICATIONS  TO  SOLUTIONS  OF  STOCHASTIC  EQUATIONS 


We apply Theorem 2.1 next in a statistical setting: consider a
finite population of $n$ individuals, each member of which comes
equipped with a latent random survival-time $X_i$ and with a
left-continuous $\{0,1\}$-valued process $r_i(\cdot)$ on $[0,\infty)$ which indicates at time
$t$ whether the death of individual $i$ at time $t$ would be observable.
Let $N_i(t) \equiv I_{[X_i \le t]}\, r_i(X_i)$ indicate whether the death of $i$ is observed
by time $t$; define $F_t \equiv \sigma( N_i(s), r_i(s) : 0 \le s \le t,\ i = 1,\dots,n )$, and assume
that

\[ \text{for each } i, \quad N_i(t) - \int_0^t r_i(s)\,dH(s) \ \text{ is a } \{F_t\}\text{-martingale} \tag{3.1} \]

where $H(\cdot)$ is some nonrandom continuous nondecreasing function on
$[0,\infty)$, not depending on $i$, such that $H(0) = 0$ and $H(\infty) = \infty$. The
statistical purpose of observing $N_i$ and $r_i$ is to estimate the
distribution function $F(\cdot) = 1 - \exp\{-H(\cdot)\}$ uniquely associated with the
cumulative hazard function $H(\cdot)$, that is, to produce a $\{F_t\}$-adapted
functional $\hat F(t)$ of $\{ N_i(\cdot), r_i(\cdot) \}_i$ which is close to $F(t)$ (uniformly
for all $t$, if possible) when $n$ is large and no other assumption than
(3.1) is made. It is well known that the product-limit or Kaplan-Meier
estimator $\hat F$, which can be defined through the stochastic equation
(Gill 1983)

\[ Z_n(t) \equiv \frac{\hat F(t) - F(t)}{1 - F(t)} = \int_0^t (1 - Z_n(u-))\, \frac{dN(u) - r(u)\,dH(u)}{r(u)} \tag{3.2} \]

where

\[ N(t) \equiv N_n(t) \equiv \sum_{i=1}^n N_i(t) \quad \text{and} \quad r_n(t) \equiv r(t) \equiv \sum_{i=1}^n r_i(t), \]

has excellent properties in this regard. Note that while the unique
locally bounded solution $Z_n(\cdot)$ of (3.2) does depend on $F$, it is easy
to check that

\[ 1 - \hat F(t) = \prod_{s \le t:\ \Delta N(s) > 0} \Big( 1 - \frac{\Delta N(s)}{r(s)} \Big) \]

does not. We do not motivate the estimator $\hat F$ here apart from the
remark that it coincides with the usual empirical distribution function
in the special case when $r_i(t) = I_{[X_i \ge t]}$ for all $i$ (i.e., in the case
where all the $X_i$ can be observed). Our purpose is to show that
exponential (in $n$) bounds on tail-probabilities for
$\sup\{ |Z_n(t)| : 0 \le t \le T \}$ can be simply derived via Theorem 2.1.
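The product form of $1 - \hat F(t)$ translates directly into a short computation. The following is a minimal sketch under assumptions made here for illustration: the data are given as a follow-up time and an observed-death flag per individual, individual $i$ is counted as at risk at time $s$ when its follow-up time is at least $s$, and the function name `kaplan_meier` is hypothetical.

```python
from collections import Counter

def kaplan_meier(times, observed):
    """Return [(s, F_hat(s))] at each distinct observed death time s.

    times[i]    : follow-up time of individual i (death or censoring time)
    observed[i] : True if that follow-up time is an observed death
    An individual is treated as at risk at time s when times[i] >= s.
    """
    deaths = Counter(t for t, d in zip(times, observed) if d)  # Delta N(s)
    survival, curve = 1.0, []
    for s in sorted(deaths):
        at_risk = sum(1 for t in times if t >= s)   # r(s)
        survival *= 1.0 - deaths[s] / at_risk       # factor 1 - Delta N(s) / r(s)
        curve.append((s, 1.0 - survival))
    return curve

if __name__ == "__main__":
    # With every death observed, the estimator reduces to the empirical distribution function.
    print(kaplan_meier([3, 1, 4, 1, 5], [True] * 5))
    # One censored individual at time 2 changes the risk set at the later death time.
    print(kaplan_meier([1, 2, 3], [True, False, True]))
```

With no censoring the jumps are multiples of $1/n$, matching the empirical distribution function as remarked in the text; the quadratic-time risk-set count is kept for readability rather than efficiency.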

It is easy to check that the martingales (3.1) and therefore also
the martingales (3.2) have calculable variance-processes. By standard
theorems on stochastic integrals, on the event $[\, r_n(s) > 0$ for all $s \le t \,]$,

\[ \langle Z_n \rangle (t) = \int_0^t [1 - Z_n(u-)]^2\, [r_n(u)]^{-1}\, dH(u). \]

Noting that $H(u) \le C$ implies $F(u) \le 1 - e^{-C}$ and $|1 - Z_n(u-)| \le e^{C}$, we
find for the stopping-time $\sigma_n \equiv \sup\{ t : H(t) \le C \ \text{and} \ r_n(t) \ge n/C \}$,
defined in terms of an arbitrary but fixed constant $C > 0$, that a.s.

\[ \langle Z_n \rangle(\sigma_n) \le C^2 e^{2C}/n \equiv \beta \quad \text{and} \quad \sup_{0 \le t \le \sigma_n} |\Delta Z_n(t)| \le C e^{C}/n \equiv K. \]

By Theorem 2.1,

\[ P\{ \sup_{0 \le t \le \sigma_n} |Z_n(t)| \ge x \} \le 2 \exp\big\{ -\tfrac12\, D\, n\, x^2/(1 + x) \big\} \tag{3.3} \]


where $D = C^{-2} e^{-2C}$. In the special case where $r_i(\cdot) = I_{[X_i \ge \cdot]}$ for all $i$,
the result (3.3) gives an upper bound related to bounds of Hoeffding
(1963); in case $r_i(t) \equiv I_{[X_i \wedge Y_i \ge t]}$, where the random variables
$\{Y_i\}_i$ form an i.i.d. sequence independent of the i.i.d. sequence $\{X_i\}_i$,
the result (3.3) yields exponential bounds derived by Földes and Rejtő
(1981) and by Csörgő and Horváth (1983). See Slud (1987) for
further discussion of the bearing of (3.3) on Estimation in Survival
Analysis, as well as of applications of the Theorem in bounding
probabilities of large deviations for compound-renewal processes.

Another arena of possible application of Theorem 2.1 is the study
of hitting- and occupation-times for the solutions of stochastic
differential equations. Such applications apparently depend heavily on
detailed estimates for solutions of associated parabolic partial
differential equations. For illustration, we sketch here an application
to large-deviation estimates for hitting times of $d$-vector Wiener
process $W(\cdot)$. Let $D$ denote a closed domain in $R^d \times [0,\infty)$
containing $(0,0)$ and with a smooth boundary. Define for each
$(x,t) \in R^d \times [0,\infty)$, conditionally given $W(t) = x$, the stopping-time

\[ \tau \equiv \tau(x,t) \equiv \inf\{ s > 0 : (W(t+s), t+s) \in \partial D \}. \]


Then for any piecewise-smooth function $f(x,t)$, Itô's Lemma says for

\[ M(t) \equiv f(W(t),t) - f(0,0) - \int_0^t \Big\{ \frac{\partial f}{\partial t}(W(s),s) + \tfrac12 \Delta f(W(s),s) \Big\}\, ds \tag{3.4} \]

that $M(\cdot \wedge \tau)$ is a martingale with respect to the $\sigma$-field family $F_t^W$
generated by $(W(s) : 0 \le s \le t)$, where $\Delta$ denotes the Laplacian on $R^d$.
Moreover, the variance-process $\langle M \rangle$ is calculable since $f(W(\cdot),\cdot)$ is
continuous, and (if we use $\nabla$ to denote gradient in $x$-variables)

\[ \langle M \rangle (t) = \int_0^t \| \nabla f(W(s),s) \|^2\, ds. \]


The particular choice of function $f(x,t) \equiv E^{(x,t)}\{ \tau(x,t) \wedge (T-t) \}$ for
$0 \le t \le T$, where $T > 0$ is fixed, is readily seen to solve the Partial
Differential Equation

\[ \frac{\partial f}{\partial t} + \tfrac12 \Delta f = -1 \quad \text{on } D, \qquad f = 0 \quad \text{on } \partial D \cup \{ (x,T) : x \in R^d \}. \]

Here we have adopted the standard notation $E^{(x,t)}$ to indicate that
expectations are taken conditionally given $W(t) = x$. Now, if we let
$L(t) \equiv \int_0^t I_{[(W(s),s) \in D]}\, ds$ denote the total occupation-time for $D$
by $W(\cdot)$ up to $t$, then Theorem 2.1 applied to the martingale $M(\cdot)$
up to stopping-time $\tau \wedge T$ says

\[ P\Big\{ f(W(t),t) - f(0,0) + L(t) \ge \alpha \ \text{for some } t \le \tau \wedge T \ \text{satisfying} \ \int_0^t \| \nabla f(W(s),s) \|^2\, ds \le \beta \Big\} \le e^{-\frac12 \alpha^2/\beta} \tag{3.5} \]

Effective application of (3.5) would require a good bound on
$\| \nabla f \|$. For instance, if we could show directly or via a comparison
method that $\| \nabla f(x,t) \|^2 \le C$ for all $(x,t) \in D$ and for all $T$, then (3.5)
implies that

\[ P^{(0,0)}\{ L(\tau) \ge \alpha + E^{(0,0)}\tau \ \text{and} \ \tau \wedge T \le \beta/C \} \le e^{-\frac12 \alpha^2/\beta}. \]

By letting $T$ increase to $\infty$, we would then conclude that

\[ P^{(0,0)}\{ L(\tau) \ge \alpha + E^{(0,0)}\tau \ \text{and} \ \tau \le \beta/C \} \le e^{-\frac12 \alpha^2/\beta}. \]